 global minimizer


A Missing statements and proofs. A.1 Statements for Section 3.1

Neural Information Processing Systems

Consider a two-player Markov game in which both players affect the transition. As we have seen in Section 2.1, in the case of unilateral deviation from joint policy ... Let σ̂ be a (possibly correlated) joint policy. By Lemma A.1, we know that ... where the equality holds due to the zero-sum property (1). An approximate NE is an approximate global minimum. An approximate global minimum is an approximate NE.


Transformers learn to implement preconditioned gradient descent for in-context learning

Neural Information Processing Systems

Several recent works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate iterations of gradient descent.
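
As a minimal illustration of the kind of construction these works describe, the sketch below checks numerically that one step of plain gradient descent on an in-context least-squares problem, started from zero weights, gives the same prediction as a single softmax-free (linear) attention readout. This is an assumption-laden toy: it covers only unpreconditioned gradient descent (identity preconditioner), and the function names, the learning rate `lr`, and the synthetic data are hypothetical choices, not taken from the paper.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def gd_one_step_prediction(xs, ys, x_query, lr):
    """Prediction for x_query after one gradient-descent step from w = 0
    on the loss L(w) = 1/(2n) * sum_i (w . x_i - y_i)^2."""
    n, d = len(xs), len(x_query)
    # At w = 0 the gradient is -(1/n) * sum_i y_i * x_i.
    grad = [-sum(y * x[j] for x, y in zip(xs, ys)) / n for j in range(d)]
    w = [-lr * g for g in grad]
    return dot(w, x_query)


def linear_attention_prediction(xs, ys, x_query, lr):
    """Softmax-free attention readout: keys x_i, values y_i, query x_query,
    with the output scaled by lr / n (an illustrative choice of weights)."""
    n = len(xs)
    scores = [dot(x, x_query) for x in xs]
    return (lr / n) * sum(s * y for s, y in zip(scores, ys))


# Synthetic in-context regression task (hypothetical data).
random.seed(0)
d, n = 3, 8
w_true = [random.gauss(0, 1) for _ in range(d)]
xs = [[random.gauss(0, 1) for _ in range(d)] for _ in range(n)]
ys = [dot(w_true, x) for x in xs]
x_query = [random.gauss(0, 1) for _ in range(d)]

pred_gd = gd_one_step_prediction(xs, ys, x_query, lr=0.1)
pred_attn = linear_attention_prediction(xs, ys, x_query, lr=0.1)
```

The two predictions agree up to floating-point rounding because both equal (lr / n) * sum_i y_i (x_i . x_query); stacking such layers to simulate further iterations, or adding a preconditioner, is beyond this sketch.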


Imbalance Trouble: Revisiting Neural-Collapse Geometry

Neural Information Processing Systems

Towards this end, we adopt the unconstrained-features model (UFM), a recent theoretical model for studying neural collapse, and introduce Simplex-Encoded-Labels Interpolation (SELI) as an invariant characterization of the neural collapse phenomenon.